Recent Evolutions of Multiple Sequence Alignment Algorithms

Author: Cédric Notredame
Fran Lewitter
Publication venue: Public Library of Science
Publication date: 01/08/2007

Crossref

Directory of Open Access Journals

PubMed Central

R-Coffee: a method for multiple alignment of non-coding RNA

Author: Higgins Desmond G.
Notredame Cédric
Wilm Andreas
Publication venue: Oxford University Press

R-Coffee is a multiple RNA alignment package, derived from T-Coffee, designed to align RNA sequences while exploiting secondary structure information. R-Coffee uses an alignment-scoring scheme that incorporates secondary structure information within the alignment. It works particularly well as an alignment improver and can be combined with any existing sequence alignment method. In this work, we used R-Coffee to compute multiple sequence alignments combining the pairwise output of sequence aligners and structural aligners. We show that R-Coffee can improve the accuracy of all the sequence aligners. We also show that the consistency-based component of T-Coffee can improve the accuracy of several structural aligners. R-Coffee was tested on 388 BRAliBase reference datasets and on 11 longer Cmfinder datasets. Altogether our results suggest that the best protocol for aligning short sequences (less than 200 nt) is the combination of R-Coffee with the RNA pairwise structural aligner Consan. We also show that the simultaneous combination of the four best sequence alignment programs with R-Coffee produces alignments almost as accurate as those obtained with R-Coffee/Consan. Finally, we show that R-Coffee can also be used to align longer datasets beyond the usual scope of structural aligners. R-Coffee is freely available for download, along with documentation, from the T-Coffee web site (www.tcoffee.org).

CiteSeerX

Crossref

PubMed Central

R-Coffee: a web server for accurately aligning noncoding RNA sequences

Author: Higgins Desmond G.
Moretti Sébastien
Notredame Cédric
Wilm Andreas
Xenarios Ioannis
Publication date: 02/08/2017

The R-Coffee web server produces highly accurate multiple alignments of noncoding RNA (ncRNA) sequences, taking into account predicted secondary structures. R-Coffee uses a novel algorithm recently incorporated in the T-Coffee package. R-Coffee works along the same lines as T-Coffee: it uses pairwise or multiple sequence alignment (MSA) methods to compute a primary library of input alignments. The program then computes an MSA highly consistent with both the alignments contained in the library and the secondary structures associated with the sequences. The secondary structures are predicted using RNAplfold. The server provides two modes. The slow/accurate mode is restricted to small datasets (fewer than 5 sequences, each shorter than 150 nucleotides) and combines R-Coffee with Consan, a very accurate pairwise RNA alignment method. For larger datasets, a fast mode (RM-Coffee) can be used, which uses R-Coffee to combine the outputs of the three programs found to perform best on RNA (MUSCLE, MAFFT and ProbConsRNA). Our BRAliBase benchmarks indicate that the R-Coffee/Consan combination is one of the best ncRNA alignment methods for short sequences, while RM-Coffee gives comparable results on longer sequences. The R-Coffee web server is available at http://www.tcoffee.or
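The choice between the two server modes described above can be sketched as a simple size check. This is only an illustration, not the server's code: the function name `choose_mode` and the returned labels are hypothetical, and the size limit is read here as "fewer than 5 sequences, each shorter than 150 nucleotides", which is an assumption about the abstract's wording.

```python
def choose_mode(sequences, max_count=5, max_length=150):
    """Pick a mode label from dataset size (hypothetical helper, not the server's API).

    Assumes the slow/accurate mode needs fewer than `max_count` sequences,
    each shorter than `max_length` nucleotides.
    """
    small = len(sequences) < max_count and all(len(s) < max_length for s in sequences)
    return "slow/accurate (R-Coffee + Consan)" if small else "fast (RM-Coffee)"


if __name__ == "__main__":
    print(choose_mode(["GGGAAACCC", "GGGAUACCC", "GGCAAAGCC"]))
    print(choose_mode(["A" * 300, "A" * 280]))
```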

RERO DOC Digital Library

Vertebrate conserved non coding DNA regions have a high persistence length and a short persistence time

Author: Beaudoing Emmanuel
Bucher Philipp
Jongeneel C Victor
Notredame Cédric
Retelska Dorota
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: The comparison of complete genomes has revealed surprisingly large numbers of conserved non-protein-coding (CNC) DNA regions. However, the biological function of CNC remains elusive. CNC differ in two aspects from conserved protein-coding regions. They are not conserved across phylum boundaries, and they do not contain readily detectable sub-domains. Here we characterize the persistence length and time of CNC and conserved protein-coding regions in the vertebrate and insect lineages. RESULTS: The persistence length is the length of a genome region over which a certain level of sequence identity is consistently maintained. The persistence time is the evolutionary period during which a conserved region evolves under the same selective constraints. Our main findings are: (i) Insect genomes contain 1.60 times less conserved information than vertebrates; (ii) Vertebrate CNC have a higher persistence length than conserved coding regions or insect CNC; (iii) CNC have shorter persistence times as compared to conserved coding regions in both lineages. CONCLUSION: Higher persistence length of vertebrate CNC indicates that the conserved information in vertebrates and insects is organized in functional elements of different lengths. These findings might be related to the higher morphological complexity of vertebrates and give clues about the structure of active CNC elements. Shorter persistence time might explain the previously puzzling observations of highly conserved CNC within each phylum, and of a lack of conservation between phyla. It suggests that CNC divergence might be a key factor in vertebrate evolution. Further evolutionary studies will help to relate individual CNC to specific developmental processes.
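One simple way to make the persistence-length definition above concrete is a sliding-window scan over a pairwise alignment. The sketch below is not the estimator used in the paper, which the abstract does not describe; the window size, the identity threshold, and the function name `persistence_lengths` are all assumptions chosen for illustration.

```python
def persistence_lengths(matches, window=10, min_identity=0.8):
    """Lengths of maximal regions in which every sliding window keeps
    at least `min_identity` identity.

    `matches` is a per-column list of 1 (identical) or 0 (different) taken
    from a pairwise alignment. Window size and threshold are illustrative.
    """
    n = len(matches)
    if n < window:
        return []
    count = sum(matches[:window])
    lengths, run = [], 0
    for start in range(n - window + 1):
        if start > 0:
            # Slide the window one column to the right.
            count += matches[start + window - 1] - matches[start - 1]
        if count / window >= min_identity:
            run += 1
        else:
            if run:
                lengths.append(run + window - 1)  # columns covered by the run
            run = 0
    if run:
        lengths.append(run + window - 1)
    return lengths


if __name__ == "__main__":
    columns = [1] * 20 + [0] * 10 + [1] * 12
    print(persistence_lengths(columns))
```

Under this toy definition a region is allowed to end with a few mismatches, as long as each window covering them still meets the threshold.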

Infoscience - École polytechnique fédérale de Lausanne

Springer - Publisher Connector

Serveur académique lausannois

Directory of Open Access Journals

PubMed Central

PROTOGENE: turning amino acid alignments into bona fide CDS nucleotide alignments

Author: Armougom Fabrice
Audic Stéphane
Keduas Vladimir
Moretti Sébastien
Notredame Cédric
Poirot Olivier
Reinier Frédéric
Publication venue: Oxford University Press
Publication date: 01/01/2006

We describe Protogene, a server that can turn a protein multiple sequence alignment into the equivalent alignment of the original gene coding DNA. Protogene relies on a pipeline where every initial protein sequence is BLASTed against RefSeq or NR. The annotation associated with potential matches is used to identify the gene sequence. This gene sequence is then aligned with the query protein using Exonerate in order to extract a coding nucleotide sequence matching the original protein. Protogene can handle protein fragments and will return every CDS coding for a given protein, even if they occur in different genomes. Protogene is available from
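Once a coding sequence has been recovered for each protein, the final conversion from a protein alignment to a nucleotide alignment amounts to threading codons onto the gapped protein rows. The sketch below shows only that last step and assumes the CDS is already known and in frame; it does not reproduce the BLAST and Exonerate stages, and `thread_cds` is a hypothetical name, not part of Protogene.

```python
def thread_cds(aligned_protein, cds, gap="-"):
    """Map a gapped protein row onto its coding sequence, codon by codon.

    Assumes `cds` starts in frame with the first residue and holds at least
    three nucleotides per residue; no translation check is performed.
    """
    residues = sum(1 for c in aligned_protein if c != gap)
    if len(cds) < 3 * residues:
        raise ValueError("CDS is too short for the number of residues")
    out, pos = [], 0
    for c in aligned_protein:
        if c == gap:
            out.append(gap * 3)          # one protein gap becomes three nucleotide gaps
        else:
            out.append(cds[pos:pos + 3])  # next codon
            pos += 3
    return "".join(out)


if __name__ == "__main__":
    print(thread_cds("M-K", "ATGAAA"))
```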

CiteSeerX

Crossref

PubMed Central

Use of ChIP-Seq data for the design of a multiple promoter-alignment method

Author: Cédric Notredame
Eduardo Eyras
Enrique Blanco
Giovanni Bussotti
Ionas Erb
Juan R. González-Vallinas
Publication venue: Oxford University Press
Publication date: 01/01/2012

We address the challenge of regulatory sequence alignment with a new method, Pro-Coffee, a multiple aligner specifically designed for homologous promoter regions. Pro-Coffee uses a dinucleotide substitution matrix estimated on alignments of functional binding sites from TRANSFAC. We designed a validation framework using several thousand families of orthologous promoters. This dataset was used to evaluate the accuracy for predicting true human orthologs among their paralogs. We found that whereas other methods achieve on average 73.5% accuracy, and 77.6% when trained on that same dataset, the figure goes up to 80.4% for Pro-Coffee. We then applied a novel validation procedure based on multi-species ChIP-seq data. Trained and untrained methods were tested for their capacity to correctly align experimentally detected binding sites. Whereas the average number of correctly aligned sites for two transcription factors is 284 for default methods and 316 for trained methods, Pro-Coffee achieves 331, 16.5% above the default average. We find a high correlation between a method's performance when classifying orthologs and its ability to correctly align proven binding sites. Not only does this have interesting biological consequences, it also allows us to conclude that any method that is trained on the ortholog data set will result in functionally more informative alignments.
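The 16.5% figure quoted above follows directly from the site counts in the abstract. A small check, using only the numbers given there (the helper name `relative_gain` is introduced here for illustration):

```python
def relative_gain(new, baseline):
    """Percentage by which `new` exceeds `baseline`."""
    return 100.0 * (new - baseline) / baseline


if __name__ == "__main__":
    default_sites, trained_sites, procoffee_sites = 284, 316, 331
    # Pro-Coffee against the default-method average: (331 - 284) / 284
    print(round(relative_gain(procoffee_sites, default_sites), 1))
    # Trained methods against the default-method average: (316 - 284) / 284
    print(round(relative_gain(trained_sites, default_sites), 1))
```

The first value reproduces the abstract's 16.5%; the second is derived here from the same counts and is not stated in the abstract.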

Crossref

PubMed Central

UPF Digital Repository

Improving the Alignment Quality of Consistency Based Aligners with an Evaluation Function Using Synonymous Protein Words

Author: Cédric Notredame
Eugene A. Permyakov
Hsin-Nan Lin
Jia-Ming Chang
Ting-Yi Sung
Wen-Lian Hsu
Publication venue: Public Library of Science
Publication date: 02/12/2011

Most sequence alignment tools can successfully align protein sequences with higher levels of sequence identity. The accuracy of corresponding structure alignment, however, decreases rapidly when considering distantly related sequences (<20% identity). In this range of identity, alignments optimized so as to maximize sequence similarity are often inaccurate from a structural point of view. Over the last two decades, most multiple protein aligners have been optimized for their capacity to reproduce structure-based alignments while using sequence information. Methods currently available differ essentially in the similarity measurement between aligned residues using substitution matrices, Fourier transform, sophisticated profile-profile functions, or consistency-based approaches, more recently
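The "<20% identity" range mentioned above depends on how identity is counted, and conventions differ. The sketch below uses one common convention, identical residues divided by the number of columns where both sequences have a residue; that choice and the function name `percent_identity` are assumptions, not taken from the paper.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent identity of two aligned rows over columns without gaps.

    Assumed convention: gapped columns are ignored in both the numerator
    and the denominator. Other definitions give different values.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    pairs = [(x, y) for x, y in zip(row_a, row_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    pid = percent_identity("ACD-F", "ACE-F")
    print(pid, "distantly related" if pid < 20 else "above the 20% range")
```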

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central